A Keras neural network and the MNIST dataset

Author: Leonardo Espin

Date: 10/2/2019

Below I train a standard feed-forward neural network and a convolutional neural network on the MNIST handwritten digits dataset.

In [1]:
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

Loading the MNIST dataset

In [2]:
(x_train_i, y_train), (x_test_i, y_test) = keras.datasets.mnist.load_data()
print(x_train_i.shape)
print(y_train.shape)
print(x_test_i.shape)
print(y_test.shape)
(60000, 28, 28)
(60000,)
(10000, 28, 28)
(10000,)
In [3]:
#check one of the images 
plt.imshow(x_train_i[0]);
print(y_train[0])
5
In [4]:
num_classes=10
img_rows, img_cols = 28,28

Preprocessing the data

In [5]:
#scale pixel values to the 0-1 range:
x_train_i = x_train_i / 255
x_test_i = x_test_i / 255

#flatten each 28x28 image into a 784-long vector for the dense network
x_train = x_train_i.reshape(x_train_i.shape[0], img_rows*img_cols)
x_test = x_test_i.reshape(x_test_i.shape[0], img_rows*img_cols)

# convert class integers to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print(y_train.shape)
print(y_test.shape)
(60000, 10)
(10000, 10)
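
As a quick aside (not part of the original run), to_categorical one-hot encodes each label: the integer k becomes a length-10 vector with a 1 at position k.

#e.g. the label 5 becomes [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(keras.utils.to_categorical(5, num_classes))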

Building a standard feed-forward DNN

In [6]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
In [7]:
model=Sequential()
#The model needs to know what input shape it should expect
model.add(Dense(25,  #first hidden layer: 25 units; input is the flattened 28*28 image
                activation='relu',
                input_shape=(img_rows*img_cols,)))

model.add(Dense(25, #number of nodes in this dense layer
                activation='relu'))

#the prediction layer. note that we convert outputs into probabilities
model.add(Dense(num_classes, #number of prediction classes
                activation='softmax'))

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 25)                19625     
_________________________________________________________________
dense_1 (Dense)              (None, 25)                650       
_________________________________________________________________
dense_2 (Dense)              (None, 10)                260       
=================================================================
Total params: 20,535
Trainable params: 20,535
Non-trainable params: 0
_________________________________________________________________

Fitting the DNN model

In [8]:
#configuring the learning process
model.compile(loss='categorical_crossentropy',#logarithmic loss for multi-class classification
              optimizer='adam',               #gradient-descent variant that adapts the learning rate for each parameter
              metrics=['accuracy'])
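
To make the loss concrete, here is a hand-computed categorical cross-entropy example (a sketch of mine, not from the notebook run): for a one-hot label y and predicted probabilities p the loss is -sum(y*log(p)), i.e. minus the log of the probability assigned to the true class.

#hypothetical prediction: the true class 5 gets probability 0.8, the rest share 0.2
y_true = keras.utils.to_categorical(5, num_classes)
p_pred = np.full(num_classes, 0.2/9)
p_pred[5] = 0.8
print(-np.sum(y_true * np.log(p_pred)))  #-log(0.8), about 0.223
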
In [9]:
model.fit(x_train, y_train,
          batch_size=100,#number of images for each gradient descent step
          epochs=20,#one pass through the entire training set is an epoch, so each image is seen 20 times
          validation_split = 0.2)
Train on 48000 samples, validate on 12000 samples
Epoch 1/20
WARNING:tensorflow:Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x7fcabb7768c8> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Num'
48000/48000 [==============================] - 4s 77us/sample - loss: 0.5390 - accuracy: 0.8479 - val_loss: 0.2813 - val_accuracy: 0.9208
Epoch 2/20
48000/48000 [==============================] - 3s 61us/sample - loss: 0.2554 - accuracy: 0.9277 - val_loss: 0.2148 - val_accuracy: 0.9402
Epoch 3/20
48000/48000 [==============================] - 3s 58us/sample - loss: 0.2114 - accuracy: 0.9404 - val_loss: 0.1981 - val_accuracy: 0.9445
Epoch 4/20
48000/48000 [==============================] - 3s 61us/sample - loss: 0.1839 - accuracy: 0.9473 - val_loss: 0.1816 - val_accuracy: 0.9480
Epoch 5/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.1654 - accuracy: 0.9520 - val_loss: 0.1669 - val_accuracy: 0.9525
Epoch 6/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.1498 - accuracy: 0.9567 - val_loss: 0.1558 - val_accuracy: 0.9548
Epoch 7/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.1369 - accuracy: 0.9603 - val_loss: 0.1516 - val_accuracy: 0.9569
Epoch 8/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.1274 - accuracy: 0.9630 - val_loss: 0.1511 - val_accuracy: 0.9567
Epoch 9/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.1184 - accuracy: 0.9657 - val_loss: 0.1445 - val_accuracy: 0.9584
Epoch 10/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.1102 - accuracy: 0.9675 - val_loss: 0.1455 - val_accuracy: 0.9572
Epoch 11/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.1033 - accuracy: 0.9698 - val_loss: 0.1510 - val_accuracy: 0.9547
Epoch 12/20
48000/48000 [==============================] - 3s 61us/sample - loss: 0.0994 - accuracy: 0.9704 - val_loss: 0.1363 - val_accuracy: 0.9608
Epoch 13/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.0916 - accuracy: 0.9728 - val_loss: 0.1370 - val_accuracy: 0.9609
Epoch 14/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.0871 - accuracy: 0.9736 - val_loss: 0.1379 - val_accuracy: 0.9605
Epoch 15/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.0830 - accuracy: 0.9753 - val_loss: 0.1334 - val_accuracy: 0.9620
Epoch 16/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.0794 - accuracy: 0.9762 - val_loss: 0.1306 - val_accuracy: 0.9628
Epoch 17/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.0760 - accuracy: 0.9769 - val_loss: 0.1403 - val_accuracy: 0.9619
Epoch 18/20
48000/48000 [==============================] - 3s 60us/sample - loss: 0.0721 - accuracy: 0.9780 - val_loss: 0.1332 - val_accuracy: 0.9622
Epoch 19/20
48000/48000 [==============================] - 3s 59us/sample - loss: 0.0682 - accuracy: 0.9793 - val_loss: 0.1523 - val_accuracy: 0.9572
Epoch 20/20
48000/48000 [==============================] - 3s 61us/sample - loss: 0.0659 - accuracy: 0.9796 - val_loss: 0.1383 - val_accuracy: 0.9616
Out[9]:
<tensorflow.python.keras.callbacks.History at 0x7fcabb774b00>
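
The fit call returns a History object (the Out[9] value above), which records the per-epoch metrics. A small sketch of how the learning curves could be plotted, assuming the call had been captured as history = model.fit(...):

#assuming:  history = model.fit(x_train, y_train, batch_size=100, epochs=20, validation_split=0.2)
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.legend();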

Testing the model on unseen data

In [10]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Test loss: 0.13213677672185004
Test accuracy: 0.9637
In [11]:
#wrapping the sample in np.array([...]) adds a batch dimension, avoiding the
#error "expected dense_input to have shape (784,)..."
print(model.predict_classes(np.array([x_test[0]])))

#array dimensions:
print(x_test[0].ndim)
print((np.array([x_test[0]])).ndim)
[7]
1
2
In [12]:
y_test[0]
Out[12]:
array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.], dtype=float32)
In [13]:
np.argwhere(y_test[0])
Out[13]:
array([[7]])
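
predict_classes simply takes the argmax of the softmax output, so the same answer can be obtained from predict directly (a sketch, not part of the original run; predict_classes was deprecated in later versions of Keras):

probs = model.predict(np.array([x_test[0]]))  #shape (1, 10): one row of class probabilities
print(np.argmax(probs, axis=1))               #[7], matching predict_classes and y_test[0]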

Building a convolutional neural network

The NN below has two extra convolutional layers. Their effect is dramatic: the amount of training required drops from 20 epochs for the feed-forward network above to 4 epochs below. Despite this reduction in training, the accuracy increases by about two percentage points (already in the first epoch the training accuracy jumps from 84% to 93%).

In [14]:
from tensorflow.keras.layers import Flatten, Conv2D
In [15]:
cnv_model=Sequential()

cnv_model.add(Conv2D(12,                #number of convolutional filters
                     kernel_size=(3, 3),#shape of convolution kernel
                 activation='relu',
                 input_shape=(img_rows, img_cols, 1)))

#another convolutional layer
cnv_model.add(Conv2D(20,kernel_size=(3, 3),
                        activation='relu'))

#a third convolutional layer was also tried; removing it improved accuracy slightly, by O(1e-3)

#the flattening layer converts the output of the previous layers
#into a 1D representation for each image
cnv_model.add(Flatten())

#this dense layer has an order of magnitude more parameters than the dense
#layers in the previous model: the flattened convolutional output has
#24*24*20 = 11,520 features per image (vs. 784 raw pixels), so this layer
#holds 11,520*25 + 25 = 288,025 weights
cnv_model.add(Dense(25, #number of nodes in this dense layer
                activation='relu'))
#the prediction layer. note that we convert outputs into probabilities
cnv_model.add(Dense(num_classes, #number of prediction classes
                activation='softmax'))
cnv_model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 12)        120       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 20)        2180      
_________________________________________________________________
flatten (Flatten)            (None, 11520)             0         
_________________________________________________________________
dense_3 (Dense)              (None, 25)                288025    
_________________________________________________________________
dense_4 (Dense)              (None, 10)                260       
=================================================================
Total params: 290,585
Trainable params: 290,585
Non-trainable params: 0
_________________________________________________________________
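
The parameter counts in the summary can be checked by hand (my arithmetic, not from the notebook): a Conv2D layer has (kernel_rows*kernel_cols*input_channels + 1)*filters parameters, and a Dense layer has (inputs + 1)*units.

print((3*3*1 + 1)*12)      #conv2d:   120
print((3*3*12 + 1)*20)     #conv2d_1: 2180
print((24*24*20 + 1)*25)   #dense_3:  288025 (Flatten outputs 24*24*20 = 11520 features)
print((25 + 1)*10)         #dense_4:  260
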
In [17]:
cnv_model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

#reshaping the input is necessary because Conv2D expects 4-D input;
#the 4th dimension is the single color channel of the gray-scale images
#(an equivalent np.expand_dims version is sketched after this cell's output)
cnv_model.fit(x_train_i.reshape(x_train_i.shape[0],img_rows,img_cols,1), y_train,
          batch_size=100,#number of images for each gradient descent step
          epochs=4,      #notice the smaller number of epochs!
          validation_split = 0.2)
Train on 48000 samples, validate on 12000 samples
Epoch 1/4
WARNING:tensorflow:Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x7fca6caac840> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Num'
48000/48000 [==============================] - 41s 856us/sample - loss: 0.2318 - accuracy: 0.9303 - val_loss: 0.0929 - val_accuracy: 0.9744
Epoch 2/4
48000/48000 [==============================] - 44s 911us/sample - loss: 0.0692 - accuracy: 0.9794 - val_loss: 0.0642 - val_accuracy: 0.9825
Epoch 3/4
48000/48000 [==============================] - 44s 924us/sample - loss: 0.0470 - accuracy: 0.9852 - val_loss: 0.0626 - val_accuracy: 0.9817
Epoch 4/4
48000/48000 [==============================] - 45s 944us/sample - loss: 0.0346 - accuracy: 0.9892 - val_loss: 0.0614 - val_accuracy: 0.9822
Out[17]:
<tensorflow.python.keras.callbacks.History at 0x7fca6ca96da0>
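
The 4-D reshape used above can equivalently be written with np.expand_dims, which makes the added channel axis explicit (a sketch, not part of the original run):

x_train_c = np.expand_dims(x_train_i, -1)  #append the single gray-scale channel axis
print(x_train_c.shape)                     #(60000, 28, 28, 1)
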
In [18]:
score2 = cnv_model.evaluate(x_test_i.reshape(x_test_i.shape[0],img_rows,img_cols,1),
                       y_test, verbose=0)
print('Test loss:', score2[0])
print('Test accuracy:', score2[1])
Test loss: 0.0518858770695515
Test accuracy: 0.9843